Project 2#
Francesca Michielli (fcm2118)
November 22, 2024
This project looks at NYC shootings rate (defined as number of shootings per 1,000 residents) by police precinct from 2018 - 2023
Data on shootings can be found here
Population data can be found here
Police precinct boundaries can be found here
1. Import and Clean Shootings Data#
import plotly.io as pio
pio.renderers.default = "vscode+jupyterlab+notebook_connected"
import pandas as pd
import plotly.express as px
The shootings data below is a list of every shooting incident in NYC from 2006 - 2023. Each observation has an incident key, occurrence date, location (including precinct), and other details about the incident.
Shootings data from here
Downloaded 11/14/24
shootings = pd.read_csv("NYPD_Shooting_Incident_Data.csv")
The following commands allow us to preview the data:
shootings.head()
| INCIDENT_KEY | OCCUR_DATE | OCCUR_TIME | BORO | LOC_OF_OCCUR_DESC | PRECINCT | JURISDICTION_CODE | LOC_CLASSFCTN_DESC | LOCATION_DESC | STATISTICAL_MURDER_FLAG | ... | PERP_SEX | PERP_RACE | VIC_AGE_GROUP | VIC_SEX | VIC_RACE | X_COORD_CD | Y_COORD_CD | Latitude | Longitude | Lon_Lat | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 244608249 | 05/05/2022 | 00:10:00 | MANHATTAN | INSIDE | 14 | 0.0 | COMMERCIAL | VIDEO STORE | True | ... | M | BLACK | 25-44 | M | BLACK | 986050.000 | 214231.000000 | 40.754692 | -73.993500 | POINT (-73.9935 40.754692) |
| 1 | 247542571 | 07/04/2022 | 22:20:00 | BRONX | OUTSIDE | 48 | 0.0 | STREET | (null) | True | ... | (null) | (null) | 18-24 | M | BLACK | 1016802.000 | 250581.000000 | 40.854402 | -73.882330 | POINT (-73.88233 40.854402) |
| 2 | 84967535 | 05/27/2012 | 19:35:00 | QUEENS | NaN | 103 | 0.0 | NaN | NaN | False | ... | NaN | NaN | 18-24 | M | BLACK | 1048632.000 | 198262.000000 | 40.710634 | -73.767773 | POINT (-73.76777349199995 40.71063412500007) |
| 3 | 202853370 | 09/24/2019 | 21:00:00 | BRONX | NaN | 42 | 0.0 | NaN | NaN | False | ... | M | UNKNOWN | 25-44 | M | BLACK | 1014493.000 | 242565.000000 | 40.832417 | -73.890714 | POINT (-73.89071440599997 40.832416753000075) |
| 4 | 27078636 | 02/25/2007 | 21:00:00 | BROOKLYN | NaN | 83 | 0.0 | NaN | NaN | False | ... | M | BLACK | 25-44 | M | BLACK | 1009149.375 | 190104.703125 | 40.688443 | -73.910219 | POINT (-73.91021857399994 40.68844345900004) |
5 rows × 21 columns
shootings.count()
INCIDENT_KEY 28562
OCCUR_DATE 28562
OCCUR_TIME 28562
BORO 28562
LOC_OF_OCCUR_DESC 2966
PRECINCT 28562
JURISDICTION_CODE 28560
LOC_CLASSFCTN_DESC 2966
LOCATION_DESC 13585
STATISTICAL_MURDER_FLAG 28562
PERP_AGE_GROUP 19218
PERP_SEX 19252
PERP_RACE 19252
VIC_AGE_GROUP 28562
VIC_SEX 28562
VIC_RACE 28562
X_COORD_CD 28562
Y_COORD_CD 28562
Latitude 28503
Longitude 28503
Lon_Lat 28503
dtype: int64
shootings.dtypes
INCIDENT_KEY int64
OCCUR_DATE object
OCCUR_TIME object
BORO object
LOC_OF_OCCUR_DESC object
PRECINCT int64
JURISDICTION_CODE float64
LOC_CLASSFCTN_DESC object
LOCATION_DESC object
STATISTICAL_MURDER_FLAG bool
PERP_AGE_GROUP object
PERP_SEX object
PERP_RACE object
VIC_AGE_GROUP object
VIC_SEX object
VIC_RACE object
X_COORD_CD float64
Y_COORD_CD float64
Latitude float64
Longitude float64
Lon_Lat object
dtype: object
The following code creates a column that represents the year of the incident:
shootings['Year'] = pd.DatetimeIndex(shootings['OCCUR_DATE']).year
shootings.head()
| INCIDENT_KEY | OCCUR_DATE | OCCUR_TIME | BORO | LOC_OF_OCCUR_DESC | PRECINCT | JURISDICTION_CODE | LOC_CLASSFCTN_DESC | LOCATION_DESC | STATISTICAL_MURDER_FLAG | ... | PERP_RACE | VIC_AGE_GROUP | VIC_SEX | VIC_RACE | X_COORD_CD | Y_COORD_CD | Latitude | Longitude | Lon_Lat | Year | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 244608249 | 05/05/2022 | 00:10:00 | MANHATTAN | INSIDE | 14 | 0.0 | COMMERCIAL | VIDEO STORE | True | ... | BLACK | 25-44 | M | BLACK | 986050.000 | 214231.000000 | 40.754692 | -73.993500 | POINT (-73.9935 40.754692) | 2022 |
| 1 | 247542571 | 07/04/2022 | 22:20:00 | BRONX | OUTSIDE | 48 | 0.0 | STREET | (null) | True | ... | (null) | 18-24 | M | BLACK | 1016802.000 | 250581.000000 | 40.854402 | -73.882330 | POINT (-73.88233 40.854402) | 2022 |
| 2 | 84967535 | 05/27/2012 | 19:35:00 | QUEENS | NaN | 103 | 0.0 | NaN | NaN | False | ... | NaN | 18-24 | M | BLACK | 1048632.000 | 198262.000000 | 40.710634 | -73.767773 | POINT (-73.76777349199995 40.71063412500007) | 2012 |
| 3 | 202853370 | 09/24/2019 | 21:00:00 | BRONX | NaN | 42 | 0.0 | NaN | NaN | False | ... | UNKNOWN | 25-44 | M | BLACK | 1014493.000 | 242565.000000 | 40.832417 | -73.890714 | POINT (-73.89071440599997 40.832416753000075) | 2019 |
| 4 | 27078636 | 02/25/2007 | 21:00:00 | BROOKLYN | NaN | 83 | 0.0 | NaN | NaN | False | ... | BLACK | 25-44 | M | BLACK | 1009149.375 | 190104.703125 | 40.688443 | -73.910219 | POINT (-73.91021857399994 40.68844345900004) | 2007 |
5 rows × 22 columns
Next I create a new dataframe that groups the shootings by precinct and year.
This gives a count of the number of shootings in each precinct in each year
shootings_grouped = shootings.groupby(["PRECINCT","Year"])["INCIDENT_KEY"].count().reset_index()
shootings_grouped.head()
| PRECINCT | Year | INCIDENT_KEY | |
|---|---|---|---|
| 0 | 1 | 2007 | 1 |
| 1 | 1 | 2008 | 1 |
| 2 | 1 | 2009 | 1 |
| 3 | 1 | 2010 | 8 |
| 4 | 1 | 2012 | 2 |
Rename the column to “Shooting Count”
shootings_grouped = shootings_grouped.rename(columns={'INCIDENT_KEY':'Shooting Count'})
shootings_grouped.head()
| PRECINCT | Year | Shooting Count | |
|---|---|---|---|
| 0 | 1 | 2007 | 1 |
| 1 | 1 | 2008 | 1 |
| 2 | 1 | 2009 | 1 |
| 3 | 1 | 2010 | 8 |
| 4 | 1 | 2012 | 2 |
Upon investigation, it appears that there are some precinct-year combinations that are not in the dataset. This is because there are some precincts that have zero shootings in a given year. To fill out the dataset with the missing values, I created a new dataframe of every year-precinct combination and merged the current dataframe to that new dataframe.
Note that I found this issue after mapping the data and seeing blank precincts on the map. I looked at the data and found that not all precincts had reported shootings in every year. Thus, these precint-year observations do not show up in the collapsed data.
First, get a list of all years and precincts:
years = shootings_grouped["Year"].unique()
years.sort()
precincts = shootings_grouped["PRECINCT"].unique()
precincts.sort()
Next, create a list of every year-precinct combination:
combinations = []
for i in years:
for n in precincts:
combinations.append((n,i))
combinations[:10]
[(1, 2006),
(5, 2006),
(6, 2006),
(7, 2006),
(9, 2006),
(10, 2006),
(13, 2006),
(14, 2006),
(17, 2006),
(18, 2006)]
Then, turn this list into a dataframe:
all_observations = pd.DataFrame(combinations, columns=['Precinct', 'Year'])
all_observations.head()
| Precinct | Year | |
|---|---|---|
| 0 | 1 | 2006 |
| 1 | 5 | 2006 |
| 2 | 6 | 2006 |
| 3 | 7 | 2006 |
| 4 | 9 | 2006 |
Merge the original dataframe onto this new dataframe with the complete list of observations:
all_observations_comb = pd.merge(
all_observations,
shootings_grouped,
left_on=["Precinct", "Year"],
right_on=["PRECINCT", "Year"],
how = "left")
all_observations_comb.head()
| Precinct | Year | PRECINCT | Shooting Count | |
|---|---|---|---|---|
| 0 | 1 | 2006 | NaN | NaN |
| 1 | 5 | 2006 | 5.0 | 3.0 |
| 2 | 6 | 2006 | 6.0 | 1.0 |
| 3 | 7 | 2006 | 7.0 | 8.0 |
| 4 | 9 | 2006 | 9.0 | 3.0 |
all_observations_comb = all_observations_comb.drop("PRECINCT", axis=1)
Replace all the shooting counts that have “NaN” with 0:
all_observations_comb["Shooting Count"] = all_observations_comb["Shooting Count"].fillna(0).astype(int)
all_observations_comb
| Precinct | Year | Shooting Count | |
|---|---|---|---|
| 0 | 1 | 2006 | 0 |
| 1 | 5 | 2006 | 3 |
| 2 | 6 | 2006 | 1 |
| 3 | 7 | 2006 | 8 |
| 4 | 9 | 2006 | 3 |
| ... | ... | ... | ... |
| 1381 | 115 | 2023 | 6 |
| 1382 | 120 | 2023 | 25 |
| 1383 | 121 | 2023 | 2 |
| 1384 | 122 | 2023 | 2 |
| 1385 | 123 | 2023 | 2 |
1386 rows × 3 columns
2. Import and Clean Population Data#
The data below shows population by police precint:
link = "nyc_precinct_2020pop.csv"
population = pd.read_csv(link)
population.head()
| precinct | P1_001N | P1_002N | P1_003N | P1_004N | P1_005N | P1_006N | P1_007N | P1_008N | P1_009N | ... | P2_064N | P2_065N | P2_066N | P2_067N | P2_068N | P2_069N | P2_070N | P2_071N | P2_072N | P2_073N | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 84799 | 76937 | 57501 | 2655 | 96 | 14624 | 55 | 2006 | 7862 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 5 | 50598 | 47663 | 13363 | 3556 | 109 | 27507 | 69 | 3059 | 2935 | ... | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 6 | 64643 | 59481 | 50320 | 1782 | 72 | 5635 | 34 | 1638 | 5162 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 7 | 57985 | 51746 | 18416 | 6934 | 396 | 17779 | 46 | 8175 | 6239 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 9 | 75951 | 68112 | 41843 | 6610 | 480 | 11322 | 84 | 7773 | 7839 | ... | 0 | 3 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5 rows × 145 columns
Clean the data by renaming the appropriate column to ‘Total Population’ and creating a new dataframe with only the two relevant columns:
population = population.rename(columns={'P1_001N':'Total Population'})
pop = population[['precinct','Total Population']]
pop
| precinct | Total Population | |
|---|---|---|
| 0 | 1 | 84799 |
| 1 | 5 | 50598 |
| 2 | 6 | 64643 |
| 3 | 7 | 57985 |
| 4 | 9 | 75951 |
| ... | ... | ... |
| 72 | 115 | 179134 |
| 73 | 120 | 122308 |
| 74 | 121 | 128149 |
| 75 | 122 | 144552 |
| 76 | 123 | 100738 |
77 rows × 2 columns
3. Merge the Shootings Data with Population Data#
Merge the two datasets on precinct
Calculate the shooting rate per precinct
merged = pd.merge(all_observations_comb,
pop,
left_on=["Precinct"],
right_on=["precinct"])
merged
| Precinct | Year | Shooting Count | precinct | Total Population | |
|---|---|---|---|---|---|
| 0 | 1 | 2006 | 0 | 1 | 84799 |
| 1 | 5 | 2006 | 3 | 5 | 50598 |
| 2 | 6 | 2006 | 1 | 6 | 64643 |
| 3 | 7 | 2006 | 8 | 7 | 57985 |
| 4 | 9 | 2006 | 3 | 9 | 75951 |
| ... | ... | ... | ... | ... | ... |
| 1381 | 115 | 2023 | 6 | 115 | 179134 |
| 1382 | 120 | 2023 | 25 | 120 | 122308 |
| 1383 | 121 | 2023 | 2 | 121 | 128149 |
| 1384 | 122 | 2023 | 2 | 122 | 144552 |
| 1385 | 123 | 2023 | 2 | 123 | 100738 |
1386 rows × 5 columns
Calculate the shooting rate per precinct per year (shootings per 1000 residents)
merged['shooting_rate'] = merged['Shooting Count']/merged['Total Population']*1000
merged
| Precinct | Year | Shooting Count | precinct | Total Population | shooting_rate | |
|---|---|---|---|---|---|---|
| 0 | 1 | 2006 | 0 | 1 | 84799 | 0.000000 |
| 1 | 5 | 2006 | 3 | 5 | 50598 | 0.059291 |
| 2 | 6 | 2006 | 1 | 6 | 64643 | 0.015470 |
| 3 | 7 | 2006 | 8 | 7 | 57985 | 0.137967 |
| 4 | 9 | 2006 | 3 | 9 | 75951 | 0.039499 |
| ... | ... | ... | ... | ... | ... | ... |
| 1381 | 115 | 2023 | 6 | 115 | 179134 | 0.033494 |
| 1382 | 120 | 2023 | 25 | 120 | 122308 | 0.204402 |
| 1383 | 121 | 2023 | 2 | 121 | 128149 | 0.015607 |
| 1384 | 122 | 2023 | 2 | 122 | 144552 | 0.013836 |
| 1385 | 123 | 2023 | 2 | 123 | 100738 | 0.019853 |
1386 rows × 6 columns
4. Import Boundaries for Police Precincts#
Police Precinct boundaries in geojson form found here
import requests
response = requests.get("https://data.cityofnewyork.us/resource/kmub-vria.geojson")
shapes = response.json()
print("loaded")
loaded
shapes["features"][0]["properties"]
{'precinct': '1', 'shape_leng': '80093.5344285', 'shape_area': '47285006.2592'}
5. Map Shootings per 1,000 residents by Precinct#
Create a function to map the shootings rate by precinct for a given year
def plot_nyc(df, year):
""" This function makes a chloropleth map of NYC,
using a DataFrame with a precinct and a shootings_per_capita column"""
fig = px.choropleth_map(
df,
locations="precinct",
color="shooting_rate",
geojson=shapes,
featureidkey="properties.precinct",
center={"lat": 40.71, "lon": -73.98},
zoom=9,
height=600,
title= (f"Shootings per 1,000 Residents Across Precinct: {year}"),
range_color=(0, 1),
labels = {"shooting_rate":"Shootings per 1,000"}
)
fig.show()
plot_nyc(merged[merged["Year"]==2020], 2020)
Exclude central park because it is an outlier:
plot_nyc(merged[(merged["precinct"] != 22)
& (merged["Year"]==2020)], 2020)
Plot maps for years 2018 - 2023
This allows us to visualize the change in shooting rates per precinct over time
for i in [2018, 2020, 2023]:
plot_nyc(merged[(merged["precinct"] != 22)
& (merged["Year"]==i)], i)
Takeaways#
These maps show how the intensity of shootings, measured as the number of shootings per 1,000 in each police precinct, changed over time. As can be seen in the maps, the blue and purple districts have relatively lower shooting rates, while the yellow and orange districts have higher rates.
These maps indicate that the intensity of shootings increased in 2020 and 2021 compared to 2018 and 2019, then fell in 2023.
Shootings are primarily concentrated in parts of Brooklyn, the Bronx, and Upper Manhattan, but didn’t impact each area at the same time. In other words, 2020 was worse for Brooklyn, while 2021 and 2022 was worse for the South Bronx and upper Manhattan.
These maps highlight a link between the onset of the Covid pandemic and gun violence in New York City.
There are many more ideas to be explored here. For instance, this analysis does not indicate the severity of shootings, nor does it tell us about the different racial or economic groups who are impacted by shootings.